Tutorial 29: Wikipedia image data

This tutorial introduces the wikiimage.py module, which we can use to grab and process image data from Wikipedia pages. Start by loading the module, as well as numpy and matplotlib (for plotting the images).

In [1]:
%pylab inline

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

import wiki
import wikiimage
import wikitext
Populating the interactive namespace from numpy and matplotlib
In [2]:
plt.rcParams["figure.figsize"] = (12, 16)

Reading image data from Wikipedia

The image_data_frame function takes a list of Wikipedia pages and returns a data frame object describing all of the images found on those pages. You can also supply minimum and maximum allowed image sizes. By default, the function will download a local copy of any image you do not yet have locally.

In [3]:
df = wikiimage.image_data_frame(['Paris', 'London'], min_size=300)
df
Pulling image from MediaWiki: 'parisiicoins.jpg'
Pulling image from MediaWiki: 'bibliothcaquenationaledefrancecparissiterichelieu-salleovale.jpg'
Pulling image from MediaWiki: 'placedelarcapubliquechcunefoulesilencieuse.jpg'
Pulling image from MediaWiki: 'conseildetatpariswa.jpg'
Pulling image from MediaWiki: 'mignard-autoportrait.jpg'
Pulling image from MediaWiki: 'parisjuly-a.jpg'
Pulling image from MediaWiki: 'themuscaedorsayatsunsetcparisjuly.jpg'
Pulling image from MediaWiki: 'londonmontagel.jpg'
Pulling image from MediaWiki: 'englandsoutheastlocationmap.svg.png'
Pulling image from MediaWiki: 'unitedkingdomadmlocationmap.svg.png'
Pulling image from MediaWiki: 'londonthamessunsetpanorama-feb.jpg'
Pulling image from MediaWiki: 'neasdentemple-shreeswaminarayanhindumandir-gate.jpg'
Out[3]:
page img max_size img_links
0 Paris parisiicoins.jpg 330 https://upload.wikimedia.org/wikipedia/commons...
1 Paris bibliothcaquenationaledefrancecparissiterichel... 300 https://upload.wikimedia.org/wikipedia/commons...
2 Paris placedelarcapubliquechcunefoulesilencieuse.jpg 350 https://upload.wikimedia.org/wikipedia/commons...
3 Paris conseildetatpariswa.jpg 330 https://upload.wikimedia.org/wikipedia/commons...
4 Paris mignard-autoportrait.jpg 304 https://upload.wikimedia.org/wikipedia/commons...
5 Paris parisjuly-a.jpg 330 https://upload.wikimedia.org/wikipedia/commons...
6 Paris themuscaedorsayatsunsetcparisjuly.jpg 330 https://upload.wikimedia.org/wikipedia/commons...
7 London londonmontagel.jpg 415 https://upload.wikimedia.org/wikipedia/commons...
8 London englandsoutheastlocationmap.svg.png 371 https://upload.wikimedia.org/wikipedia/commons...
9 London unitedkingdomadmlocationmap.svg.png 386 https://upload.wikimedia.org/wikipedia/commons...
10 London londonthamessunsetpanorama-feb.jpg 300 https://upload.wikimedia.org/wikipedia/commons...
11 London neasdentemple-shreeswaminarayanhindumandir-gat... 340 https://upload.wikimedia.org/wikipedia/commons...

Note that the returned results include the page name, the path of the image, and a column called "max_size", which gives the length of the image's largest dimension (either the height or the width).
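Because the result is an ordinary pandas data frame, the max_size column can be used with standard boolean masking. Here is a small sketch using a hypothetical miniature version of the data frame above (df_demo is made up for illustration):

```python
import pandas as pd

# A hypothetical miniature version of the data frame returned above
df_demo = pd.DataFrame({
    "page": ["Paris", "Paris", "London"],
    "img": ["a.jpg", "b.jpg", "c.jpg"],
    "max_size": [330, 300, 415],
})

# Keep only images whose largest dimension is at least 320 pixels
large = df_demo[df_demo["max_size"] >= 320]
print(large["img"].tolist())  # ['a.jpg', 'c.jpg']
```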

Displaying the images in Python

The load_image function takes the name of an image and returns a PIL image object, a special image type that can be plotted in Python with matplotlib.

In [4]:
img = wikiimage.load_image(df.img.values[4])
type(img)
Using TensorFlow backend.
Out[4]:
PIL.JpegImagePlugin.JpegImageFile
In [5]:
plt.imshow(img)
Out[5]:
<matplotlib.image.AxesImage at 0xb3aefef98>

Here is some Python code that plots all of the images in the data frame. Note that you may need to modify the line plt.subplot(4, 3, ind + 1) if you change the data: the 4 gives the number of rows in the plot and the 3 gives the number of columns. If you have more than 12 images, only the first 12 will be shown. You can also adjust the plt.rcParams["figure.figsize"] = (12, 16) above to change the overall size of the output (I find that I need to adjust this depending on my screen and the images in question).
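The subplot arithmetic can be made explicit. The following sketch computes how many rows a fixed three-column grid needs; grid_shape is a hypothetical helper written for this tutorial, not part of wikiimage:

```python
from math import ceil

def grid_shape(n_images, ncols=3, max_cells=12):
    # Rows/columns to pass to plt.subplot, showing at most max_cells images
    n = min(n_images, max_cells)
    return ceil(n / ncols), ncols

print(grid_shape(12))  # (4, 3)
print(grid_shape(7))   # (3, 3)
```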

In [6]:
for ind in range(df.shape[0]):
    try:
        plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
        plt.subplot(4, 3, ind + 1)

        img = wikiimage.load_image(df.iloc[ind]['img'])
        plt.imshow(img)
        plt.axis("off")

    except Exception:
        # skip images that fail to load or that exceed the 4x3 grid
        pass

Image embedding

Last time we saw how the VGG19 model takes a 224-by-224 pixel image and returns a list of 1000 probabilities predicting which objects appear in the image. Here's the model once again:

In [7]:
from keras.applications.vgg19 import VGG19
vgg19_full = VGG19(weights='imagenet')
vgg19_full.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________

The VGG19 model as described here is really only useful if we care about the 1000 categories from the ILSVRC competition. Why, then, is it important enough to include in the keras module? In and of itself, it really is not. The reason the model is so important is something called transfer learning.

It turns out that if we apply only a subset of the layers, say all but the final one, the neural network serves as a form of dimensionality reduction. Look at the model above: the output of the layer fc2 projects a 224 * 224 * 3, or 150,528, dimensional object into a 4096-dimensional space. To produce such an embedding, I'll use keras to cut the model off at the second-to-last layer:
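The arithmetic behind that claim is easy to check:

```python
# Flattened dimension of a 224-by-224 RGB input versus the fc2 embedding
input_dim = 224 * 224 * 3
fc2_dim = 4096

print(input_dim)             # 150528
print(input_dim // fc2_dim)  # 36, i.e. roughly a 37-fold reduction
```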

In [8]:
from keras.models import Model

vgg_fc2 = Model(inputs=vgg19_full.input, outputs=vgg19_full.get_layer('fc2').output)
vgg_fc2.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
=================================================================
Total params: 139,570,240
Trainable params: 139,570,240
Non-trainable params: 0
_________________________________________________________________

And we can apply the model just as we did before, but the output now contains 4096 dimensions. These dimensions, just like those from PCA and t-SNE, do not have an explicit meaning. The relationships between images in the embedding space, however, capture semantic relationships, which we will explore shortly.

In [9]:
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input

img = wikiimage.load_image(df.img.values[1], target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

y = vgg_fc2.predict(x)
y.shape
Out[9]:
(1, 4096)

Embeddings in wikiimage

The wikiimage module contains the function vgg19_embed, which performs the embedding into the fc2 layer. Conveniently, the embeddings are cached, so you only need to construct them once (creating the embeddings can take a while).
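The caching pattern can be sketched as follows. Note that cached_embed and its arguments are hypothetical, meant only to illustrate the idea; they are not the actual implementation inside wikiimage:

```python
import os
import numpy as np

def cached_embed(img_name, embed_fn, cache_dir="embed_cache"):
    """Compute an embedding once, store it on disk, reuse it afterwards."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = os.path.join(cache_dir, img_name + ".npy")
    if os.path.exists(cache_file):
        return np.load(cache_file)   # cache hit: no model evaluation needed
    vec = embed_fn(img_name)         # cache miss: run the (slow) model
    np.save(cache_file, vec)
    return vec
```

Calling cached_embed twice with the same image name evaluates embed_fn only the first time; the second call just reloads the saved array.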

In [10]:
df_fc2 = wikiimage.vgg19_embed(df.img.values)
df_fc2.shape
1/1 [==============================] - 1s 743ms/step
1/1 [==============================] - 1s 576ms/step
1/1 [==============================] - 1s 568ms/step
1/1 [==============================] - 1s 549ms/step
1/1 [==============================] - 1s 582ms/step
1/1 [==============================] - 1s 597ms/step
1/1 [==============================] - 1s 600ms/step
1/1 [==============================] - 1s 610ms/step
1/1 [==============================] - 1s 604ms/step
1/1 [==============================] - 1s 610ms/step
1/1 [==============================] - 1s 570ms/step
1/1 [==============================] - 1s 554ms/step
Out[10]:
(12, 4096)

The output is a numpy array with one row for each image and 4096 columns. Again, we will see how to use these in just a moment.

Bulk download

As with the Wikipedia pages at the start of the semester, I do not want you all to have to wait a long time to download the images for today's class. Conveniently, we can reuse the same bulk download function if we are clever about calling the "language" of the images "img" and the "language" of the embeddings "embed". Grab the text, image, and embedding archives here:

In [11]:
wiki.bulk_download('impressionists-text', lang='en')
Added 1 files from an archive of 922 files.
Out[11]:
1
In [12]:
wiki.bulk_download('impressionists-image', lang='img')
Added 3011 files from an archive of 3023 files.
Out[12]:
3011
In [13]:
wiki.bulk_download('impressionists-embed', lang='embed')
Added 3013 files from an archive of 3025 files.
Out[13]:
3013

Exploring impressionists

For today's tutorial, let's create a dataset of all of the pages linked to from the Impressionism page and extract all of the images from them. Note: you should already have almost all of these from the bulk download above. If it starts downloading a lot of material, something is wrong!

In [14]:
page_links = wikitext.get_internal_links("Impressionism")['ilinks'] + ["Impressionism"]
df = wikiimage.image_data_frame(page_links, download=True, min_size=224, max_size=750)
df
Pulling image from MediaWiki: 'courbetcolonne.png'
Pulling image from MediaWiki: 'marcelduchampmonalisalhooq.jpg'
Pulling image from MediaWiki: 'dariushborborcsouth-eastviewchousenahavandictehranciranc.jpg'
Pulling image from MediaWiki: 'yvangollcsurrcaalismecmanifestedusurrcaalismecvolumecnumbercoctobercccoverbyrobertdelaunay.jpg'
Out[14]:
page img max_size img_links
0 A_Sunday_Afternoon_on_the_Island_of_La_Grande_... georgesseurat-asundayonlagrandejatte---googlea... 350 https://upload.wikimedia.org/wikipedia/commons...
1 Aaron_Copland nadiaboulanger.jpg 275 https://upload.wikimedia.org/wikipedia/commons...
2 Aaron_Copland alfredstieglitz.jpg 277 https://upload.wikimedia.org/wikipedia/commons...
3 Aaron_Copland carloschavez.jpg 281 https://upload.wikimedia.org/wikipedia/commons...
4 Aaron_Copland aaroncoplandhouseccortlandtmanorcny.jpg 250 https://upload.wikimedia.org/wikipedia/commons...
5 Aaron_Copland victorkraftc.jpg 281 https://upload.wikimedia.org/wikipedia/commons...
6 Aaron_Copland golden-muse-muralfanfare-for-the-common-mancin... 333 https://upload.wikimedia.org/wikipedia/commons...
7 Abel_Gance ganceabel-xb-.jpg 274 https://upload.wikimedia.org/wikipedia/commons...
8 Abel_Gance amourtragiquedemonalisa-ccandidodefaria--eye-e... 294 https://upload.wikimedia.org/wikipedia/commons...
9 Abstract_expressionism no.c.jpg 447 https://upload.wikimedia.org/wikipedia/en/thum...
10 Abstract_expressionism newman-onement.jpg 288 https://upload.wikimedia.org/wikipedia/en/thum...
11 Abstract_expressionism boonoiloncanvaspaintingbyjamesbrookscctategall... 229 https://upload.wikimedia.org/wikipedia/en/thum...
12 Abstract_expressionism newman-whosafraidofredcyellowandblue.jpg 266 https://upload.wikimedia.org/wikipedia/commons...
13 Académie_des_Beaux-Arts institutdefrance-acadcamiefrancaaiseetpontdesa... 300 https://upload.wikimedia.org/wikipedia/commons...
14 Adolf_Loos adolfloos..jpg 365 https://upload.wikimedia.org/wikipedia/commons...
15 Afternoon_of_a_Faun_(Nijinsky) bakstnizhinsky.jpg 275 https://upload.wikimedia.org/wikipedia/commons...
16 Afternoon_of_a_Faun_(Nijinsky) helenmenelauslouvregfull.jpg 232 https://upload.wikimedia.org/wikipedia/commons...
17 Afternoon_of_a_Faun_(Nijinsky) nijinskyfauncarryingscarfbarondemeyer.jpg 364 https://upload.wikimedia.org/wikipedia/en/thum...
18 Afternoon_of_a_Faun_(Nijinsky) laprcas-mididunfaunebyl.bakst.jpg 280 https://upload.wikimedia.org/wikipedia/commons...
19 Afternoon_of_a_Faun_(Nijinsky) laprcas-mididunfaunebyl.bakst.jpg 270 https://upload.wikimedia.org/wikipedia/commons...
20 Afternoon_of_a_Faun_(Nijinsky) nijinskyfaunandnymphentwined.jpg 265 https://upload.wikimedia.org/wikipedia/en/thum...
21 Alban_Berg wpalbanberg.jpg 244 https://upload.wikimedia.org/wikipedia/commons...
22 Alban_Berg albanbergbuesteschiefling.jpg 331 https://upload.wikimedia.org/wikipedia/commons...
23 Albert_Aurier auriercalbertcbnfgallica.jpg 281 https://upload.wikimedia.org/wikipedia/commons...
24 Albert_Gleizes albertgleizescc.cphotographbypierrechoumoff..jpg 297 https://upload.wikimedia.org/wikipedia/commons...
25 Albert_Gleizes albertgleizesccbordsdelamarnecoiloncanvascxcmc... 290 https://upload.wikimedia.org/wikipedia/en/thum...
26 Albert_Gleizes lbertgleizesccfemmeauxphloxcoiloncanvascxcmcex... 290 https://upload.wikimedia.org/wikipedia/en/thum...
27 Albert_Gleizes albertgleizesccportraitdejacquesnayralcoilonca... 368 https://upload.wikimedia.org/wikipedia/en/thum...
28 Albert_Gleizes albertgleizescclesbaigneusescoiloncanvascxcmcp... 350 https://upload.wikimedia.org/wikipedia/en/thum...
29 Albert_Gleizes albertgleizescexposicicbdartcubistacgaleriesda... 385 https://upload.wikimedia.org/wikipedia/en/thum...
... ... ... ... ...
3413 The_Doge's_Palace_Seen_from_San_Giorgio_Maggio... thedogespalaceseenfromsangiorgiomaggioremetdt.jpg 300 https://upload.wikimedia.org/wikipedia/commons...
3414 The_Magpie_(Monet) claudemonet-themagpie-googleartproject.jpg 300 https://upload.wikimedia.org/wikipedia/commons...
3415 The_Magpie_(Monet) theluncheonbyclaudemonet-stcadel-frankfurtamma... 337 https://upload.wikimedia.org/wikipedia/commons...
3416 The_Phillips_Collection locationmapwashingtondcclevelandparktosouthwes... 225 https://upload.wikimedia.org/wikipedia/commons...
3417 The_Phillips_Collection pierre-augusterenoir-luncheonoftheboatingparty... 300 https://upload.wikimedia.org/wikipedia/commons...
3418 The_Phillips_Collection elgreco-therepentantst.peter-googleartproject.jpg 274 https://upload.wikimedia.org/wikipedia/commons...
3419 The_Plum edouardmanet-theplum-nationalgalleryofart.jpg 449 https://upload.wikimedia.org/wikipedia/commons...
3420 The_Valley_of_the_Nervia claudemonet-thevalleyofthenervia.jpg 300 https://upload.wikimedia.org/wikipedia/commons...
3421 Theodore_Earl_Butler portraitofbutler.jpg 306 https://upload.wikimedia.org/wikipedia/commons...
3422 Theodore_Earl_Butler entrancetothegardengate.jpg 276 https://upload.wikimedia.org/wikipedia/en/thum...
3423 Theodore_Earl_Butler theodoreearlbutlerflags.jpg 260 https://upload.wikimedia.org/wikipedia/commons...
3424 Viking_art osebergshipheadpost.jpg 244 https://upload.wikimedia.org/wikipedia/commons...
3425 Viking_art viking-whale-boneplaque-walters.jpg 231 https://upload.wikimedia.org/wikipedia/commons...
3426 Viking_art britishmuseumpenrithhoardbrooches.jpg 274 https://upload.wikimedia.org/wikipedia/commons...
3427 Viking_art kunststilederwikingerzeit.jpg 360 https://upload.wikimedia.org/wikipedia/commons...
3428 Viking_art upplandsruninskrift.jpg 227 https://upload.wikimedia.org/wikipedia/commons...
3429 Viking_art urnesportalen.jpg 299 https://upload.wikimedia.org/wikipedia/commons...
3430 Water_Lilies_(1919) wlametmuseumwaterliliesbyclaudemonet.jpg 300 https://upload.wikimedia.org/wikipedia/commons...
3431 Water_Lilies_(Monet_series) claudemonet-thewaterlilies-theclouds-googleart... 400 https://upload.wikimedia.org/wikipedia/commons...
3432 Water_Lilies_(Monet_series) claudemonet-thewaterlilies-settingsun-googlear... 400 https://upload.wikimedia.org/wikipedia/commons...
3433 Water_Lilies_(Monet_series) wlamomareflectionsofcloudsonthewater-lilypondm... 400 https://upload.wikimedia.org/wikipedia/commons...
3434 Wayback_Machine waybackmachinelogo.svg.png 250 https://upload.wikimedia.org/wikipedia/commons...
3435 Wayback_Machine waybackmachinehomepagenovember.png 300 https://upload.wikimedia.org/wikipedia/en/thum...
3436 Williamstown,_Massachusetts berkshirecountymassachusettsincorporatedanduni... 446 https://upload.wikimedia.org/wikipedia/commons...
3437 Williamstown,_Massachusetts thompsonmemorialchapelcwilliamscollege.jpg 299 https://upload.wikimedia.org/wikipedia/commons...
3438 Williamstown,_Massachusetts springstreetcwilliamstownma.jpg 275 https://upload.wikimedia.org/wikipedia/commons...
3439 Women_in_the_Garden claudemonet.jpg 369 https://upload.wikimedia.org/wikipedia/commons...
3440 Impressionism claudemonetcimpressioncsoleillevant.jpg 340 https://upload.wikimedia.org/wikipedia/commons...
3441 Impressionism jamesabbotmcneillwhistler.jpg 228 https://upload.wikimedia.org/wikipedia/commons...
3442 Impressionism serovdevochkaspersikami.jpg 248 https://upload.wikimedia.org/wikipedia/commons...

3443 rows × 4 columns

Next, let's grab the VGG19 embeddings for these images. This may take a minute or two since there is a lot to load, but it should finish relatively quickly because almost all of the embeddings were downloaded in bulk above.

In [15]:
wikiart_fc2 = wikiimage.vgg19_embed(df.img.values)
wikiart_fc2.shape
Out[15]:
(3443, 4096)

Now, finally, let's see why these embeddings are so useful. Let's start with image number 700:

In [16]:
start_img = 700
img = wikiimage.load_image(df.iloc[start_img]['img'])
plt.imshow(img)
Out[16]:
<matplotlib.image.AxesImage at 0xb3d699208>

We can compute the distance in the 4096-dimensional embedding space between this image and every other image in our corpus; here we use the L1 (Manhattan) distance.
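The same computation on a toy embedding matrix makes the numpy broadcasting explicit:

```python
import numpy as np

# Toy "embedding" matrix: four images in a three-dimensional space
emb = np.array([[0., 0., 0.],
                [1., 0., 0.],
                [1., 1., 0.],
                [5., 5., 5.]])

# L1 distance from row 0 to every row, computed by broadcasting
dists = np.sum(np.abs(emb - emb[0, :]), axis=1)
print(dists)                  # [ 0.  1.  2. 15.]

# Indices of the two closest rows (row 0 is closest to itself)
print(np.argsort(dists)[:2])  # [0 1]
```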

In [17]:
dists = np.sum(np.abs(wikiart_fc2 - wikiart_fc2[start_img, :]), 1)
dists.shape
Out[17]:
(3443,)

Then, we'll sort these distances and get the indices of the 24 closest images in this space (of course, the closest image will be image 700 itself).

In [18]:
idx = np.argsort(dists.flatten())[:24]
idx
Out[18]:
array([ 700,  745, 1054,  147,  554, 3037, 2384,  755,  509,  775, 1640,
       2351,  407,  857, 3038, 1692, 1877,  309, 2110,  128, 1512, 1677,
         23, 1398])

Finally, let's see all of the images in order from closest to farthest:

In [19]:
plt.figure(figsize=(14, 36))
for ind, i in enumerate(idx):
    try:
        plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
        plt.subplot(8, 3, ind + 1)

        img = wikiimage.load_image(df.iloc[i]['img'])
        plt.imshow(img)
        plt.axis("off")

    except Exception:
        # skip any image that fails to load
        pass

Fairly accurate when you consider all of the image types in the corpus, no?

Testing the embeddings

The code below picks a random starting image and displays the 24 closest images in the fc2 space. Run it multiple times and record particularly interesting image numbers. Where does the embedding work well, and where does it run into problems? Tell me about at least one image that matched better than you expected and at least one that it had trouble dealing with:

Answer:

In [21]:
start_img = np.random.randint(0, df.shape[0])

print("Grabbed image number {0:d}.".format(start_img))
print(df.iloc[start_img])

dists = np.sum(np.abs(wikiart_fc2 - wikiart_fc2[start_img, :]), 1)
idx = np.argsort(dists.flatten())[:24]
plt.figure(figsize=(14, 36))

for ind, i in enumerate(idx):
    try:
        plt.subplots_adjust(left=0, right=1, bottom=0, top=1)
        plt.subplot(8, 3, ind + 1)

        img = wikiimage.load_image(df.iloc[i]['img'])
        plt.imshow(img)
        plt.axis("off")

    except Exception:
        # skip any image that fails to load
        pass
Grabbed image number 392.
page                                                   Belgium
img                                       blueeurozone.svg.png
max_size                                                   235
img_links    https://upload.wikimedia.org/wikipedia/commons...
Name: 392, dtype: object